For further reading, see [JWHT], pp. 15-36.
Statistical learning is a set of approaches for “learning” the function \(f\) in the model \(Y = f(X) + \epsilon\) from a data set (i.e., a sample of the random variables \(X\) and \(Y\)).
For a fixed \(\hat{f}\) and \(X\), writing \(\hat{Y} = \hat{f}(X)\) and using \(E[\epsilon] = 0\),
\[ \begin{align} E[(Y - \hat{Y})^2] &= E[(f(X) + \epsilon - \hat{f}(X))^2] \\ &= (f(X) - \hat{f}(X))^2 + 2\,(f(X) - \hat{f}(X))\,E[\epsilon] + E[\epsilon^2] \\ &= (f(X) - \hat{f}(X))^2 + V(\epsilon) \end{align} \]
The first term is the reducible error, which a better choice of \(\hat{f}\) can shrink; \(V(\epsilon)\) is the irreducible error, a floor on accuracy that no estimate of \(f\) can overcome.
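This identity can be checked by simulation. The sketch below uses a made-up true function \(f\), a made-up fixed estimate \(\hat{f}\), and Gaussian noise (all hypothetical choices, not from the text) to compare the Monte Carlo average of \((Y - \hat{Y})^2\) at a fixed \(x\) against \((f(x) - \hat{f}(x))^2 + V(\epsilon)\):

```python
import random

# Hypothetical example: verify E[(Y - Yhat)^2] = (f(x) - fhat(x))^2 + Var(eps)
# for a fixed x, a known true f, and a fixed (non-random) estimate fhat.
random.seed(0)

f = lambda x: 3 * x + 1          # true (normally unknown) function
fhat = lambda x: 2.5 * x + 1.5   # some fixed estimate of f
x = 2.0
sigma = 0.5                      # standard deviation of the noise eps

n = 200_000
sq_errors = []
for _ in range(n):
    y = f(x) + random.gauss(0, sigma)    # draw Y = f(X) + eps
    sq_errors.append((y - fhat(x)) ** 2)

mc_estimate = sum(sq_errors) / n
theory = (f(x) - fhat(x)) ** 2 + sigma ** 2  # reducible + irreducible error
print(mc_estimate, theory)  # the two values should agree closely
```

With these choices the reducible error is \((7 - 6.5)^2 = 0.25\) and the irreducible error is \(0.25\), so both printed values should be near \(0.5\).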
For models with a numeric response variable, the mean squared error (MSE) is a common measure of error:
\[ MSE = \frac{1}{n} \sum_{i = 1}^n \left(y_i - \hat{f}(x_i)\right)^2 \]
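The MSE formula translates directly to code. The data and the linear \(\hat{f}\) below are invented for illustration:

```python
# A minimal sketch of computing MSE for a fitted model fhat on a data set;
# the observations and the model here are hypothetical.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

fhat = lambda x: 2 * x  # a hypothetical fitted model

# MSE = (1/n) * sum of squared residuals (y_i - fhat(x_i))^2
mse = sum((y - fhat(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse)  # -> 0.022 (up to floating-point rounding)
```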
A learning model is a way of estimating \(f\).
Let \((x_0, y_0)\) be a fixed testing-set observation, and consider \(\hat{f}\) itself to be a random variable: it varies with the randomly sampled training set from which it is constructed, and possibly with randomness used in its construction.
Generally speaking, more flexible models have lower bias but higher variance, so minimizing expected test error means trading one off against the other. Mathematically, for fixed \((x_0, y_0)\), considering \(\hat{f}\) as a random variable,
\[ \begin{align} E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] &= \text{Var}\left[\hat{f}(x_0) \right] + \left[\text{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \text{Var}(\epsilon) \\ \text{Ave test MSE} &= \text{Variance} + \text{Bias}^2 + \text{irreducible error} \end{align} \]
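The decomposition can also be seen numerically. The sketch below (all choices hypothetical: a quadratic true \(f\), Gaussian noise, and simple linear regression as a deliberately biased estimator) repeatedly draws a training set, refits \(\hat{f}\), and compares the average test MSE at \(x_0\) with Variance \(+\) Bias\(^2\) \(+ \text{Var}(\epsilon)\):

```python
import random

# Hypothetical simulation of the bias-variance decomposition at a fixed x0.
random.seed(1)

f = lambda x: x ** 2   # true function (unknown in practice)
sigma = 0.3            # noise standard deviation
x0 = 0.5               # fixed test point

def draw_training_set(n=20):
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [f(x) + random.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # least-squares simple linear regression: a deliberately biased model
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    return lambda x: a + b * x

reps = 5000
preds, test_sq_errors = [], []
for _ in range(reps):
    fhat = fit_line(*draw_training_set())      # fhat varies with training set
    y0 = f(x0) + random.gauss(0, sigma)        # independent test response
    preds.append(fhat(x0))
    test_sq_errors.append((y0 - fhat(x0)) ** 2)

mean_pred = sum(preds) / reps
variance = sum((p - mean_pred) ** 2 for p in preds) / reps
bias_sq = (mean_pred - f(x0)) ** 2
ave_test_mse = sum(test_sq_errors) / reps

# The two sides of the decomposition should agree closely:
print(ave_test_mse, variance + bias_sq + sigma ** 2)
```

Because the linear model cannot match the quadratic \(f\), the bias term here stays bounded away from zero no matter how many training observations are drawn; only a more flexible model class could reduce it.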